In [3]:

    
from IPython.display import HTML  # public import path for the HTML display helper
import pandas as pd
import numpy as np
HTML('''
<style>
.videoWrapper {
	position: relative;
	padding-bottom: 56.25%; /* 16:9 */
	padding-top: 25px;
	height: 0;
}
.videoWrapper iframe {
	position: absolute;
	top: 0;
	left: 0;
	width: 100%;
	height: 100%;
}
p {
    word-break: break-all;
    white-space: normal;
}
div.prompt {display:none}
div.cell { /* Tunes the space between cells */
margin-top:1em;
margin-bottom:1em;
}
div.text_cell_render h1 { /* Main titles bigger, centered */
font-size: 2.2em;
line-height:1.4em;
text-align:center;
}
div.text_cell_render h2 { /*  Parts names nearer from text */
margin-bottom: -0.4em;
}
p.fontTitle { /* Customize text cells */
font-family: 'Times New Roman';
font-size:2.5em;
line-height:1em;
padding-left:0em;
padding-right:0.5em;
}
p.fontReg { /* Customize text cells */
font-family: 'Times New Roman';
font-size:1.25em;
line-height:1em;
padding-left:3em;
padding-right:1em;
}
p.Under { /* Customize text cells */
font-family: 'Times New Roman';
font-size:0.75em;
line-height:2em;
padding-left:5em;
padding-right:3em;
}
</style>
<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Show/hide code"></form>
<p class='fontTitle'>Neural Networks with TensorFlow and other algorithms.</p>
<br>
<p class='fontReg'>Basis: This project deals with decision trees (the building blocks of random forests), linear regression, and neural networks, compares the three, and tunes each for the data set at hand.</p>
<p class='fontReg'>Preprocessing: All the data is preprocessed by converting every non-numeric term to a number, so that a mathematical model can be run on it, with fig1 showing the inputs and outputs:</p>
<p class='fontReg'>   Unprocessed data:</p>
''')









    Out[3]:








Neural Networks with TensorFlow and other algorithms.



Basis: This project deals with decision trees (the building blocks of random forests), linear regression, and neural networks, compares the three, and tunes each for the data set at hand.
Preprocessing: All the data is preprocessed by converting every non-numeric term to a number, so that a mathematical model can be run on it, with fig1 showing the inputs and outputs:
   Unprocessed data:




In [57]:

    
# Load the raw data; the first CSV column (INDEX) becomes the row index.
data = pd.read_csv("pima.csv", index_col=0)
data.head(10)









    Out[57]:







  
    
      
INDEX  AGE  WORKCLASS         FNLWGT  EDUCATION  EDUCATION_NUM  MARITAL_STATUS         OCCUPATION         RELATIONSHIP   RACE   SEX     CAPITAL_GAIN  CAPITAL_LOSS  HOURS_PER_WEEK  NATIVE_COUNTRY  INCOME
1      39   State-gov         77516   Bachelors  13             Never-married          Adm-clerical       Not-in-family  White  Male    2174          0             40              United-States   <=50K
2      50   Self-emp-not-inc  83311   Bachelors  13             Married-civ-spouse     Exec-managerial    Husband        White  Male    0             0             13              United-States   <=50K
3      38   Private           215646  HS-grad    9              Divorced               Handlers-cleaners  Not-in-family  White  Male    0             0             40              United-States   <=50K
4      53   Private           234721  11th       7              Married-civ-spouse     Handlers-cleaners  Husband        Black  Male    0             0             40              United-States   <=50K
5      28   Private           338409  Bachelors  13             Married-civ-spouse     Prof-specialty     Wife           Black  Female  0             0             40              Cuba            <=50K
6      37   Private           284582  Masters    14             Married-civ-spouse     Exec-managerial    Wife           White  Female  0             0             40              United-States   <=50K
7      49   Private           160187  9th        5              Married-spouse-absent  Other-service      Not-in-family  Black  Female  0             0             16              Jamaica         <=50K
8      52   Self-emp-not-inc  209642  HS-grad    9              Married-civ-spouse     Exec-managerial    Husband        White  Male    0             0             45              United-States   >50K
9      31   Private           45781   Masters    14             Never-married          Prof-specialty     Not-in-family  White  Female  14084         0             50              United-States   >50K
10     42   Private           159449  Bachelors  13             Married-civ-spouse     Exec-managerial    Husband        White  Male    5178          0             40              United-States   >50K



In [47]:

    
HTML('''<br>
<p class='fontReg'>   Processed data:</p>''')









    Out[47]:




   Processed data:



In [58]:

    
# Load the preprocessed (all-numeric) version of the same data.
data2 = pd.read_csv("Processed_DATA.csv", index_col=0)
data2.head(10)









    Out[58]:







  
    
      
INDEX  AGE  WORKCLASS  FNLWGT  EDUCATION  EDUCATION_NUM  MARITAL_STATUS  OCCUPATION  RELATIONSHIP  RACE  SEX  CAPITAL_GAIN  CAPITAL_LOSS  HOURS_PER_WEEK  NATIVE_COUNTRY  INCOME
1      39   0          77516   0          13             0               0           0             0     0    2174          0             40              0               0
2      50   1          83311   0          13             1               1           1             0     0    0             0             13              0               0
3      38   2          215646  1          9              2               2           0             0     0    0             0             40              0               0
4      53   2          234721  2          7              1               2           1             1     0    0             0             40              0               0
5      28   2          338409  0          13             1               3           2             1     1    0             0             40              1               0
6      37   2          284582  3          14             1               1           2             0     1    0             0             40              0               0
7      49   2          160187  4          5              3               4           0             1     1    0             0             16              2               0
8      52   1          209642  1          9              1               1           1             0     0    0             0             45              0               1
9      31   2          45781   3          14             0               3           0             0     1    14084         0             50              0               1
10     42   2          159449  0          13             1               1           1             0     0    5178          0             40              0               1
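
The preprocessing code itself is not shown in this notebook. Comparing the two tables, though, the integer codes look as if they were assigned in order of first appearance (State-gov = 0, Self-emp-not-inc = 1, Private = 2, and so on). A minimal sketch that reproduces this pattern, assuming `pandas.factorize` or something equivalent was used; the helper name `encode_categoricals` is hypothetical:

```python
import pandas as pd

# Three columns from the first ten raw rows shown above.
raw = pd.DataFrame({
    "AGE": [39, 50, 38, 53, 28, 37, 49, 52, 31, 42],
    "WORKCLASS": ["State-gov", "Self-emp-not-inc", "Private", "Private", "Private",
                  "Private", "Private", "Self-emp-not-inc", "Private", "Private"],
    "EDUCATION": ["Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors",
                  "Masters", "9th", "HS-grad", "Masters", "Bachelors"],
})

def encode_categoricals(df):
    """Replace each non-numeric column with integer codes in order of first appearance.

    Hypothetical helper: the author's real preprocessing may differ.
    """
    encoded = df.copy()
    mappings = {}
    for column in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[column]):
            codes, uniques = pd.factorize(encoded[column])
            encoded[column] = codes
            mappings[column] = list(uniques)  # position in the list == integer code
    return encoded, mappings

processed, mappings = encode_categoricals(raw)
print(processed)
print(mappings["WORKCLASS"])
```

Keeping the `mappings` dictionary makes it possible to translate codes back to the original labels later. Note that codes like these impose an arbitrary ordering on the categories, which a linear model will treat as meaningful.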

Neural Networks with Keras on top of TensorFlow:

According to the Keras website (https://www.keras.io): "Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research."

According to the TensorFlow website (https://www.tensorflow.org): "TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well."

Linear regression:

Fits a straight line (or, with several inputs, a plane) through the data, relating one scalar dependent variable (Y) to one or more explanatory variables (X), usually by least squares. Wikipedia states:
"Given a data set of n statistical units, a linear regression model assumes that the relationship between the dependent variable yi and the p-vector of regressors xi is linear. This relationship is modeled through a disturbance term or error variable εi, an unobserved random variable that adds noise to the linear relationship between the dependent variable and regressors. Thus the model takes the form" $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$.

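The linear model described above can be sketched with NumPy alone. This is an illustration on synthetic data, not the author's code: the coefficients in `true_beta`, the noise level, and the sample size are all made up, and ordinary least squares via `np.linalg.lstsq` is assumed as the fitting method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (not the census data): n = 200 units, p = 2 regressors.
n, p = 200, 2
X = rng.normal(size=(n, p))
true_beta = np.array([1.5, -2.0, 0.5])    # intercept, beta_1, beta_2 (invented values)
noise = rng.normal(scale=0.1, size=n)      # plays the role of the disturbance term
y = true_beta[0] + X @ true_beta[1:] + noise

# Prepend a column of ones so the intercept is estimated with the other coefficients.
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares: pick the coefficients that minimise the squared residuals.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)
```

With this little noise, the estimated coefficients in `beta_hat` should land close to the invented `true_beta` values.
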
Decision trees:

Makes decisions by following a "tree" of tests. Per Wikipedia: "A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules."
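
To make the node, branch, and leaf vocabulary concrete, here is a tiny hand-written tree in plain Python. The attribute names come from the data set above, but the thresholds and the resulting labels are invented for illustration; a learned tree would pick its own tests from the training data.

```python
# Internal nodes are dicts holding a test; leaves are plain class labels.
# Thresholds below are hypothetical, not learned from the data.
tree = {
    "test": lambda row: row["EDUCATION_NUM"] >= 13,
    "yes": {
        "test": lambda row: row["HOURS_PER_WEEK"] >= 45,
        "yes": ">50K",
        "no": "<=50K",
    },
    "no": "<=50K",
}

def classify(node, row):
    """Follow one root-to-leaf path and return the class label at the leaf."""
    while isinstance(node, dict):
        node = node["yes"] if node["test"](row) else node["no"]
    return node

example = {"EDUCATION_NUM": 14, "HOURS_PER_WEEK": 50}
print(classify(tree, example))
```

Each root-to-leaf path corresponds to one classification rule, for example "EDUCATION_NUM >= 13 and HOURS_PER_WEEK >= 45 implies >50K" in this made-up tree.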

Results:

So why use the different methods?

Because neural networks can generally be more accurate once tuned and tweaked, but they usually take longer to get right than linear regression or a decision tree. Here, linear regression took around 1/100th of the time that my neural network took to complete, and it was around 10% more accurate than the neural network. The same applies to the decision tree: it was noisier than the linear regression at all points in the testing, but could reach around 1%-2% higher accuracy than the linear regression. This is related to the "No Free Lunch" theorem, which in a nutshell states that no one algorithm works best for every problem.
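
The notebook does not show how accuracy and run time were measured. One possible way to compare models on both, sketched on synthetic data: the `evaluate` helper, the 0.5 threshold used to turn regression output into a 0/1 label, and the train/test split are all assumptions, not the author's method.

```python
import time
import numpy as np

def evaluate(fit_predict, X_train, y_train, X_test, y_test):
    """Return (accuracy, seconds) for a function that trains and predicts 0/1 labels."""
    start = time.perf_counter()
    predictions = fit_predict(X_train, y_train, X_test)
    elapsed = time.perf_counter() - start
    accuracy = float(np.mean(predictions == y_test))
    return accuracy, elapsed

def linear_regression_classifier(X_train, y_train, X_test):
    """Least-squares fit, then threshold the continuous output at 0.5 (an assumption)."""
    design = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    scores = np.column_stack([np.ones(len(X_test)), X_test]) @ beta
    return (scores >= 0.5).astype(int)

# Synthetic, linearly separable data standing in for the real features.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

accuracy, seconds = evaluate(linear_regression_classifier, X[:300], y[:300], X[300:], y[300:])
print(f"accuracy={accuracy:.2f}, time={seconds:.4f}s")
```

Any other model can be dropped in by passing a function with the same `(X_train, y_train, X_test)` signature, so all three methods are timed and scored the same way.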

Conclusion:

As shown in the graph above, linear regression appears to work best thanks to its consistently high accuracy (82%), compared with the much noisier decision tree and the TensorFlow neural network, which only reached the 70% range.

https://www.tensorflow.org

https://www.keras.io

https://en.wikipedia.org/wiki/No_free_lunch_theorem

https://en.wikipedia.org/wiki/Linear_regression

https://en.wikipedia.org/wiki/Decision_tree


